ABSTRACT 

A close comparison of Kepler short- and long-cadence data released prior to 2011 Nov 1 
has shown some subtle differences that make the short-cadence data superior to their 
long-cadence counterparts. The inevitable results of a faster sampling rate are present: 
the short-cadence data provide greater time resolution for short-lived events like flares, 
and have a much higher Nyquist frequency than the long-cadence data; however, they 
also contain fewer high-amplitude peaks at low frequency and allow a more precise 
determination of pulsation frequencies, amplitudes and phases. The latter observation 
indicates that Kepler data are not normally distributed. Moreover, a close inspection 
of the Pre-search Data Conditioned (PDC) long-cadence data shows residuals that 
have increased noise on time-scales important to asteroseismology, but unimportant 
to planet searches. 



1 INTRODUCTION 



The Kepler Space Telescope is in a 372.5-d heliocentric 
Earth-trailing orbit, collecting white-light photometric data 
for a sample of ~160 000 stars covering a field of view of 115 
deg². The core goal of the mission is detection of Earth-like 
planets orbiting Sun-like stars within the habitable zone. 
The details of the mission goals and design are described by 
Koch et al. (2010) and Borucki et al. (2010). 

Secondary to the main goal of the mission is asteroseismology, 
which can provide valuable information on the host stars 
that is important in planet characterisation. With 
transit depths, only the ratio of the planet and host star 
radii is available, but asteroseismology allows detailed 
analysis of the star's interior if the star pulsates, and for 
solar-like oscillators can yield the star's radius to better than 
3 per cent (Stello et al. 2009), even approaching 1 per cent 
in some cases (Gilliland et al. 2010a). Through asteroseismology, 
Kepler also promises significant advances in stellar 
astrophysics with the dedication of ~1 per cent of observations 
to asteroseismic study (Gilliland et al. 2010a). 

Kepler data are available in two cadences, long (LC) 
and short (SC). Each cadence is composed of multiple 
6.02-s exposures with associated 0.52-s readout times 
(Gilliland et al. 2010b). The LC data integrate over 270 
exposures to give 29.4-min observations, whereas the SC data 
contain nine exposures giving one data point every 58.9 s. 
Both cadences are stored on-board and downlinked to Earth 
roughly every 32 d, introducing gaps up to ~24 h in length 
while the photometer is not collecting data. Kepler completes 
one quarter of its orbit after three downlinks, and 
must then perform a quarterly roll to keep its solar panels 
pointing towards the Sun, and its radiator pointing to 
deep space. Kepler data are therefore organized into quarters 
and thirds around those rolls and downlinks. LC data 
quarters are denoted by Qn, and SC quarters by Qn.m to 
indicate which third (or 'month') of that quarter the data 
correspond to. 
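
The cadence lengths quoted above follow from the exposure and readout times. The short Python sketch below reproduces that arithmetic using only the numbers given in the text; the variable names are illustrative and not part of any Kepler software.

```python
# Cadence arithmetic from the values quoted in the text.
EXPOSURE_S = 6.02   # single exposure time (s)
READOUT_S = 0.52    # readout time associated with each exposure (s)
FRAME_S = EXPOSURE_S + READOUT_S  # exposure plus readout (s)

LC_FRAMES = 270     # exposures integrated into one long-cadence point
SC_FRAMES = 9       # exposures integrated into one short-cadence point

lc_cadence_min = LC_FRAMES * FRAME_S / 60.0   # long-cadence interval (min)
sc_cadence_s = SC_FRAMES * FRAME_S            # short-cadence interval (s)
sc_per_lc = LC_FRAMES // SC_FRAMES            # SC points per LC point

print(f"LC cadence: {lc_cadence_min:.1f} min")
print(f"SC cadence: {sc_cadence_s:.1f} s")
print(f"SC points per LC point: {sc_per_lc}")
```

The factor of 30 between the two cadences is the same factor used later when SC points are binned to mimic LC data.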

Pre-Q9, Kepler data were available in two forms: (i) 
'raw' flux, of which Simple Aperture Photometry (SAP) flux 
is the preferred nomenclature, and for which basic calibra- 
tion is performed distinguishing it from the truly raw flux, 
and (ii) Pre-search Data Conditioned (PDC) 'corrected' 
flux. The PDC data were created as a step towards facilitating 
planetary transit searches and should be used only 
with caution in astrophysical analyses because, in data 
releases 11 and earlier, the PDC pipeline can modify some 
stellar variability in the light curves. 
This is discussed in Section 5. Post-Q9, PDC has been 
superseded by another pipeline, PDC MAP. New quarters of 
data will contain PDC MAP rather than the old PDC fluxes, 
and older quarters of data are due to be reprocessed with 
this pipeline and made public by July 2012. 

There are many advantages of SC data over LC data, 
but hardware limitations restrict SC slot allocation to 512 
slots at any given time. Here we discuss the following advan- 
tages of SC data: increased sampling rate; higher Nyquist 
frequency; fewer low-frequency artefacts; and reduced er- 
rors on frequency, amplitude and phase determinations in 
the Fourier spectrum. We also discuss the difference in dis- 
tribution of data points between SC and LC data and look 
at the differences in the PDC and SAP-flux data. Initial 
characteristics of the LC and SC data can be found in 
Jenkins et al. (2010) and Gilliland et al. (2010b), respectively. 
For a detailed, recent review of Kepler noise properties, 
see Gilliland et al. (2011) and references therein. 

2 SAMPLING RATE 



There are 30 times more SC points than LC points in a 
quarter, arising from the longer integration time used for LC 
data. Two well-known effects this has on time-series analysis 
are the time resolution available, and the associated Nyquist 
frequency: 



2.1 Time resolution 

The primary mission goal of planet detection requires the in- 
creased sampling rate to time transits more precisely; signs 
of gravitational perturbation seen as changes in transit duration 
may lead to subsequent detections of other planets 
orbiting the same star. Holman & Murray (2005) calculate 
that under the gravitational influence of other solar system 
bodies, Earth's apparent transit time for an observer viewing 
along the orbital plane would appear to decrease by around 
650 s for 2 in 10 of its orbits, depending on the relative posi- 
tion of each planet and the observer with respect to orbital 
phase. The effect is even greater for planets orbiting farther 
from their star (up to ~6000 s for Mars) , and for planets 
orbiting less massive stars. The sensitivity required for such 
detections is easily met by Kepler, where transit durations 
would vary by 11 and 101 SC points for the Earth and Mars 
cases respectively, even where errors from photon statistics 
on such a transit duration reach ~500 s, equivalent to 8 SC 
points (Holman & Murray 2005). Indeed, the first previously 
unknown planet to be detected using this technique, Kepler-19c, 
does not appear to transit its star (Ballard et al. 2011), 
but leaves a clear sinusoidal deviation of the transit times 
once the transits of Kepler-19b are subtracted out. 
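
The conversion from timing shifts to numbers of SC points quoted in this subsection can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name `whole_sc_points` is hypothetical, and it is an assumption here that the quoted values count complete 58.9-s sampling intervals.

```python
SC_CADENCE_S = 58.9  # short-cadence sampling interval (s), from the text


def whole_sc_points(duration_s, cadence_s=SC_CADENCE_S):
    """Number of complete SC sampling intervals contained in duration_s."""
    return int(duration_s // cadence_s)


earth_shift = whole_sc_points(650.0)    # ~650 s variation for the Earth case
mars_shift = whole_sc_points(6000.0)    # ~6000 s variation for the Mars case
photon_error = whole_sc_points(500.0)   # ~500 s photon-statistics error

print(earth_shift, mars_shift, photon_error)
```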

The higher sampling rate is useful for astrophysical 
studies too, particularly for short-lived events like flares. 
The star KIC 12406908 is in Kepler Asteroseismic Science 
Consortium¹ Working Group 7 (KASC WG7: Cepheid 
Variables), but is probably misclassified. It is one of the 
~15 per cent of the Kepler Input Catalogue (KIC) stars that 
have no fundamental parameters listed, that is, no T_eff or 
log g values are available. Fig. 1 shows a 7.2-h sample of the 
light curve of the largest flare in Q3.1. The LC data have 
been plotted underneath the SC data for comparison, and a 
shift in magnitude has been created for demonstrative pur- 
poses only. The longer integration time in LC has the effect 
of averaging the SC points, and under-samples the 0.056 mag 
flare. The shape of the flare, including its erratic nature as 
its luminosity output rises and falls numerous times across 
the event, is lost in the LC data. Only a rough approxima- 
tion of its magnitude and duration would be determinable 
without the SC data. Events with such short time-scales can 
clearly only be studied in SC. 



2.2 Nyquist frequency 

The most important benefit of SC data to asteroseismology 
is the higher Nyquist frequency, and the exquisite quality 
of the Kepler data provide a nice opportunity to demon- 
strate this. Many asteroseismic targets pulsate at frequen- 
cies higher than the Nyquist frequency of the LC data 



1 http://astro.phys.au.dk/KASC/ 

Figure 1. A large-amplitude flare on KIC 12406908, plotted 
against time (BJD − 2455118.0). The LC data 
(red squares) are plotted beneath the SC data (blue circles) for 
comparison. The change in brightness is precise, but not accurate: 
the dimmest LC point was chosen as the zero point for the 
graph, and all SC points are offset for clarity. The SC data used 
are Q3.1 PDC flux (see Section 5 for more details on PDC flux), 
and the LC data are simulated by averaging bins of 30 consecutive 
SC points. 



(24.469 d⁻¹; 283.21 μHz), and cannot be studied reliably due 
to aliasing problems. Solar-like oscillations and roAp star 
pulsations only occur at frequencies much higher than this. 
Straddling both sides of the LC Nyquist frequency are the 
δ Sct stars. These stars pulsate in low-order p-modes (pressure 
modes) and typically have frequencies in the range 
4-50 d⁻¹ (46-579 μHz; Breger 2000), although the highest 
published frequency for a δ Sct star is currently 79.5 d⁻¹ 
(920 μHz; Amado et al. 2004). 

The Nyquist frequency is equal to half the rate at 
which a signal is being sampled. Since LC has a point every 
29.4 min, there are 48.9 points per day, and the Nyquist 
frequency is therefore 24.5 d⁻¹ (284 μHz). Hence if a signal 
has a frequency higher than the Nyquist frequency, it is 
not fully sampled, and an alias will be detected in the 
Fourier spectrum at $2f_{\rm Nyquist} - f_{\rm signal}$. It is not always 
obvious that these detected frequencies are 'reflections' of 
frequencies higher than $f_{\rm Nyquist}$, and they can sometimes be 
interpreted as real pulsation frequencies. 
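
The reflection described above can be sketched numerically. The Python below computes the LC Nyquist frequency from the 29.4-min sampling interval and the frequency at which a signal above it would appear. The function names are illustrative, the 30 d⁻¹ test signal is hypothetical, and the repeated folding used for signals above twice the Nyquist frequency is the standard generalisation of the single reflection given in the text, not something taken from it.

```python
SECONDS_PER_DAY = 86400.0


def nyquist_per_day(cadence_min):
    """Nyquist frequency (cycles per day) for a sampling interval in minutes."""
    samples_per_day = 24.0 * 60.0 / cadence_min
    return samples_per_day / 2.0


def to_microhertz(freq_per_day):
    """Convert a frequency from cycles per day to microhertz."""
    return freq_per_day / SECONDS_PER_DAY * 1.0e6


def apparent_frequency(f_signal, f_nyquist):
    """Frequency at which a sampled signal appears below f_nyquist.

    For f_nyquist < f_signal < 2 * f_nyquist this reduces to the
    reflection 2 * f_nyquist - f_signal quoted in the text.
    """
    f_sampling = 2.0 * f_nyquist
    f = f_signal % f_sampling
    return f if f <= f_nyquist else f_sampling - f


f_nyq = nyquist_per_day(29.4)
print(f"LC Nyquist: {f_nyq:.2f} per day ({to_microhertz(f_nyq):.0f} microhertz)")
# Hypothetical pulsation at 30 cycles per day, above the LC Nyquist frequency:
print(f"Apparent LC frequency: {apparent_frequency(30.0, f_nyq):.2f} per day")
```

A signal below the Nyquist frequency is returned unchanged, which is why a reflected peak cannot be distinguished from a real one using LC data alone.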

One is naturally cautious when any frequencies are de- 
tected in LC near the Nyquist frequency, as the star could 
have pulsation frequencies above the Nyquist frequency even 
if the detected frequencies are real and not reflections. How- 
ever, for much lower frequencies the possibility of a reflection 
seems more remote. 

KIC 10977859 is a δ Sct star in which the SC data show 
only high-order p-modes, and a lack of pulsations at frequencies 
below the LC Nyquist frequency (Fig. 2, upper panel). 
When only the LC data are considered (if SC data were 
not available, for instance), the spectrum looks entirely 
different. Nyquist frequency limitations mean that in LC data 
the true frequencies of high-order p-modes would not be 
discernible, but a huge number of peaks are visible below the 
LC Nyquist frequency instead (Fig. 2, lower panel), strongly 
implying the star pulsates in low-order p-modes and maybe 
g-modes (gravity modes), too. What is even more misleading 
in this case, and makes the situation problematic, is the 
absence of peaks in the periodogram of the LC data between 16 
and 24.4 d⁻¹ (185-282 μHz), fooling the observer into believing 
these are low-order p-modes with only a small likelihood 

Figure 2. Upper panel: the Fourier spectrum of the δ Sct star KIC 10977859 for Q1, showing only high-order, high-frequency p-modes, 
and a lack of pulsations at lower frequencies. Lower panels: a magnification of the spectrum between 0 and 16 d⁻¹ for the same star, in 
SC (blue, middle) and LC (red, bottom). The amplitudes of the real peaks in the top panel are up to five times greater than those of the corresponding 
peaks in the bottom panel, indicating a significant amplitude reduction in reflected peaks. No points were removed from these SAP-flux 
data, which were analysed using the statistical package Period04 (Lenz & Breger 2004). 



of signals at higher frequencies - there is no warning of what 
lies beyond the LC Nyquist frequency. 

One must exercise extreme caution when analysing LC 
data if there are no SC data to test for aliasing problems 
associated with the LC Nyquist frequency. 



3 LOW FREQUENCY PEAKS 

Another benefit of SC data over LC is the reduced number 
of high-amplitude peaks at low frequency. Such peaks can 
arise naturally from long time-scale processes such as differ- 
ential velocity aberration, with stars moving across the CCD 
by up to 1.5 pixels, which results in a different flux fraction 
being captured by the CCD, and the amount of background 
contaminating light changing. García et al. (2011) cite CCD 
degradation as a cause of long time-scale drifts too, but CCD 
degradation is likely to arise from high-energy cosmic ray impacts 
and will often, as a result, be more of a step-function 
than a long-term trend. SAP-flux data do contain strong 
instrumental trends that dominate at low frequency, rendering 
the difference in prevalence of low-frequency peaks between 
the two cadences insignificant in SAP-flux. However, the LC 
PDC flux data, from which instrumental trends have mostly 
been removed, still contain some relatively high-amplitude 
peaks at low frequency. 

In order to demonstrate this phenomenon, a nearly con- 
stant star was selected to minimize the number of peaks seen 
as a result of pulsations. The nature of any frequency peaks 
in the Fourier transform depends greatly on the length of 
the dataset. For one of the stars analysed, KIC 9390100, 
the LC Q2 data span 88.9 d, but the SC Q2.2 data only 
span 30.0 d - the star was not observed in SC during Q2.1 
and Q2.3. It was therefore necessary to truncate the length 
of the LC data to that of the SC Q2.2 data. Regardless 
of cadence, each quarter is divided into thirds by Kepler's 
downlinking process, thereby the cadences corresponding to 

Figure 3. The PDC SC data (blue) are plotted on top of the PDC LC data (red) of Q2.2 for the near-constant star KIC 9390100 for 
frequencies 0-2 d⁻¹. Plotted in black are the LC data created from merging the PDC SC data. The blue and black lines almost completely 
overlap, but the displacement of the red line indicates the extent of artificial low-frequency peaks introduced by the PDC data pipeline. 



Q2.1 and Q2.3 in the LC data were easily removed. No out- 
liers were removed from either dataset. A third dataset was 
made for comparison by binning sets of 30 consecutive SC 
points and replacing them with a single point; the fluxes of 
the SC points were added and their times averaged. This 
'merging' process creates points that are entirely concurrent 
with the LC data (for every LC point there is a merged 
point at exactly the same time), and the merged and Kepler 
LC datasets shown in Fig. 3 contain exactly the same number 
of points. To achieve this, two more points had to be 
removed from the LC dataset, at BJD = 2455042.3243 and 
2455051.4377, because undefined SC flux values in these bins 
did not allow accurate merged data points to be created, but 
Kepler LC points existed at those times. The 29x2 SC data 
points corresponding to each of these LC points were also 
removed for a fair comparison in this PDC-data example. 
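
The merging procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code: the function name `merge_sc_to_lc` is hypothetical, the SC points are assumed to be contiguous and already aligned with the LC cadence boundaries, and undefined fluxes are assumed to be stored as NaN.

```python
import math


def merge_sc_to_lc(times, fluxes, bin_size=30):
    """Bin consecutive SC points into LC-like points.

    Fluxes in each bin are summed and times are averaged, as described
    in the text. Bins containing an undefined (NaN) flux are skipped,
    mirroring the removal of the two affected LC points.
    """
    merged_times, merged_fluxes = [], []
    for start in range(0, len(times) - bin_size + 1, bin_size):
        t_bin = times[start:start + bin_size]
        f_bin = fluxes[start:start + bin_size]
        if any(math.isnan(f) for f in f_bin):
            continue  # no accurate merged point can be made for this bin
        merged_times.append(sum(t_bin) / bin_size)
        merged_fluxes.append(sum(f_bin))
    return merged_times, merged_fluxes


# Synthetic example: three bins of constant flux, one bin with a bad point.
times = [i * 58.9 / 86400.0 for i in range(90)]  # days
fluxes = [1000.0] * 90
fluxes[45] = float("nan")
merged_t, merged_f = merge_sc_to_lc(times, fluxes)
print(len(merged_t), merged_f)
```

With real data one would bin on cadence number rather than on list position, so that gaps in the SC series do not shift later bins.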

At lower frequencies (below 2 d⁻¹, and especially below 
0.4 d⁻¹) the SC and LC spectra are significantly different 
(Fig. 3). There are a few coherent peaks common to both 
cadences, but the LC data have many more high-amplitude² 
peaks at those frequencies under 2 d⁻¹. This difference 
between SC and LC Kepler data is often encountered, and 
requires treating before frequency analysis. 

At higher frequencies (above 2 d⁻¹) the two amplitude 
spectra are almost the same, and can be seen converging in 
the figure. The merged data, which one expects to be simi- 
lar to the LC data because they are created by integrating 
fluxes over the same cadence-numbers, mimic the SC data 
very well over the entire frequency range. Slight differences 
can be accounted for by considering that there are occasion- 
ally SC data points next to data gaps that do not get incor- 
porated into merged data, because the merged data must be 
concurrent with the LC data for a direct comparison. 

The Kepler LC data should not behave exactly like 

2 relatively speaking: amplitudes of 10⁻⁵ mag are normally 
considered tiny in the analyses of δ Sct stars! 



the merged data, because LC and SC data go through 
a different calibration process (involving such methods as 
dark/bias subtraction and flat-field removal), but the scale 
of the difference suggests that discrepancies do remain af- 
ter the PDC-correction procedure. These discrepancies are 
not seen to the same extent in the SAP-flux data; the 
difference in peak heights in the PDC example in Fig. 3 is 
12 μmag (400 per cent) compared with 6 μmag in SAP-flux 
(2 per cent), noting that instrumental trends dominate in 
SAP-flux. That some discrepancies remain in the PDC data 
is not surprising, as they were designed to facilitate 
planet-finding, not asteroseismology. While the residual peaks 
contribute significantly to the noise on time-scales important to 
asteroseismology, planetary transit searches are not greatly 
affected. Gilliland et al. (2011) compared noise on 6.5-h 
time-scales, chosen to be representative of planet transit 
durations; this corresponds to a frequency of 3.7 d⁻¹, but all 
three lines in Fig. 3 are already converging at 2 d⁻¹. In fact, 
a planet orbiting a Sun-like star and having a transit time 
of 1 d (a time-scale where the discrepancies are an important 
source of excess noise) would have a semi-major axis 
of 3.4 AU and a period of 6.3 yr. The detection and confirmation 
of such a planet is well beyond the design capabilities 
of Kepler unless the mission competes successfully for an 
extension. 
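
The figures in this paragraph can be checked with a short calculation. The sketch below is an approximation under explicit assumptions that are not stated in the text: a circular orbit, a central transit across the full stellar diameter, a negligible planet radius, and solar values for the stellar mass and radius. The function names are illustrative.

```python
import math

R_SUN_M = 6.957e8         # solar radius (m)
AU_M = 1.495978707e11     # astronomical unit (m)
YEAR_D = 365.25           # Julian year (d)


def orbital_period_years(a_au):
    """Kepler's third law for a solar-mass star: P[yr] = a[AU]**1.5."""
    return a_au ** 1.5


def central_transit_duration_days(a_au):
    """Time to cross one stellar diameter at the circular orbital speed."""
    period_d = orbital_period_years(a_au) * YEAR_D
    orbital_speed = 2.0 * math.pi * a_au * AU_M / period_d  # m per day
    return 2.0 * R_SUN_M / orbital_speed


def timescale_to_frequency_per_day(timescale_h):
    """Frequency (cycles per day) corresponding to a time-scale in hours."""
    return 24.0 / timescale_h


print(f"6.5-h time-scale: {timescale_to_frequency_per_day(6.5):.1f} per day")
print(f"Period at 3.4 AU: {orbital_period_years(3.4):.1f} yr")
print(f"Transit duration at 3.4 AU: {central_transit_duration_days(3.4):.2f} d")
print(f"Transit duration at 1 AU: {central_transit_duration_days(1.0) * 24:.1f} h")
```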

In addition to contributions to the total noise level, the 
low-frequency artefacts can cause other problems, an ex- 
ample of which is in automatic frequency extraction proce- 
dures that select the highest amplitude peak or peaks in the 
Fourier spectrum to classify a star. If the spectrum is dom- 
inated by non-astrophysical low frequency peaks, then false 
classification may occur and lead to incorrect statistics on 
both dominant frequencies and amplitudes. 

A solution to the problem is on the way. PDC MAP 
sees a ~10-20 per cent improvement in signal-to-noise on the 
6.5-h time-scales reported on in Gilliland et al. (2011) (Jon 
Jenkins, priv. comm.), and supports their conclusions that 

Figure 4. The Fourier transform of the LC Q3.2 (truncated) 
data for KIC 3437940 is plotted at the bottom in blue, and the 
difference between the amplitudes of the SC and LC data for the 
same month is plotted above in red, showing that the SC data 
have slightly higher-amplitude peaks due to the shorter integra- 
tion time and the averaging effect inherent within LC data. Notice 
the change in scale on the vertical axis by a factor of 10 between 
the two plots. 



the major contributor to the observed excess noise still orig- 
inates in the stars themselves. 



4 PEAK WIDTHS, AMPLITUDES AND ERRORS 

The width of a peak in the Fourier spectrum of a dataset 
of length T can be approximated by 1/T (the Rayleigh criterion). 
For datasets of equal length, the greater number of 
points in SC data has no effect on the width of the peak, 
but the amplitude of peaks is different. As mentioned in Sec- 
tion 2, the longer integration time of the LC data causes an 
averaging effect that is more noticeable for shorter-period 
events or pulsations. The same effect reduces the amplitude 
of peaks in the periodogram of LC data, as shown in Fig. 4. 
The percentage difference in amplitude between SC and LC 
increases with higher frequencies. 

This effect can be explained mathematically. For a 
Fourier peak with true amplitude $A_0$, it can be shown (see 
online supplementary material) that the observed amplitude, 
$A$, is described by the equation 

$$A = \frac{\sin(\pi/n)}{\pi/n}\,A_0 \qquad (1)$$



for $n$ points per cycle. Using KIC 3437940 (from Fig. 4) as 
a numerical example, the ratio of the observed amplitude in 
SC to that of LC, $A_{\rm SC}/A_{\rm LC}$, is 1.08 for the peak at 10.5 d⁻¹, 
but $A_{\rm SC}/A_{\rm LC}$ increases to 1.34 for a hypothetical peak at 
20 d⁻¹. 
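
The two ratios quoted above can be reproduced from equation (1), read here as A = [sin(π/n)/(π/n)] A_0 with n the number of points per pulsation cycle. The Python sketch below assumes sampling intervals of 29.4 min for LC and 58.9 s for SC, as given in the Introduction; the function names are illustrative.

```python
import math

SECONDS_PER_DAY = 86400.0
LC_CADENCE_S = 29.4 * 60.0   # long-cadence sampling interval (s)
SC_CADENCE_S = 58.9          # short-cadence sampling interval (s)


def amplitude_factor(freq_per_day, cadence_s):
    """Observed-to-true amplitude ratio A / A_0 from equation (1)."""
    n = SECONDS_PER_DAY / (freq_per_day * cadence_s)  # points per cycle
    x = math.pi / n
    return math.sin(x) / x


def sc_to_lc_ratio(freq_per_day):
    """Ratio of observed SC amplitude to observed LC amplitude."""
    return (amplitude_factor(freq_per_day, SC_CADENCE_S)
            / amplitude_factor(freq_per_day, LC_CADENCE_S))


for freq in (10.5, 20.0):
    print(f"{freq:5.1f} per day: A_SC/A_LC = {sc_to_lc_ratio(freq):.2f}")
```

The SC factor stays within a tenth of a per cent of unity at these frequencies, so almost all of the ratio comes from the attenuation in LC.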

KIC 10977859 (Fig. 2) demonstrates the same effect for 
high-frequency pulsations that are reflected in the Nyquist 
frequency. The highest peak in the bottom panel is located 
in frequency just where one expects from the reflection of the 
highest peak in the top panel (to well within one FWHM), 



Table 1. The formal least-squares errors on frequency, amplitude, 
and phase for Q3.2 SC and LC data of the star KIC 3437940, with 
the LC errors being greater in each case by factors of ~5. The 
least-squares errors were calculated with Period04. 

cadence    frequency error    amplitude error    phase error 
           (×10⁻⁵ d⁻¹)        (μmag)             (×10⁻³ rad) 

SC PDC     4.9                14                 0.4 
LC PDC     25.8               70                 2.2 
SC SAP     4.6                13                 0.4 
LC SAP     25.8               70                 2.2 



and has an amplitude reduction in agreement with equation (1) 
to within the least-squares errors. One can therefore 
do asteroseismology on p-modes above the LC Nyquist 
frequency using LC data, providing at least one month of 
SC data is available to overcome the aliasing problem. 

Having a greater number of points allows a more pre- 
cise determination of pulsation frequencies, amplitudes and 
phases. To give a quantitative example, one month of LC 
data was compared to SC data for the star KIC 3437940 and 
points were only removed to truncate the LC Q3 dataset 
to exactly the same time-span as the SC Q3.2 dataset - 
as was done in Section 3 for KIC 9390100. The errors on 
frequency, amplitude and phase for the different cadences 
are summarised in Table 1, and are on the order of 5 times 
greater for the LC data. The result applies to both SAP and 
PDC flux, indicating the precision difference is not a result 
of the greater variance at low frequency presented in Fig. 3. 
This is an important distinction pointing to greater quality 
of the SC data, and implies the data are not normally 
distributed. Degroote et al. (2009) found a similar result for 
CoRoT noise properties. 

The scatter of points in SC data is greater than that of 
LC data. There are 30 times more points in SC data, so the 
scatter of points is expected to be √30 times greater if the 
noise is assumed to be white³. This is particularly noticeable 
in those stars that are approximately constant. In those that 
pulsate, one has to be careful when discarding those points 
that appear to be outliers - there should be more outliers 
in SC because of the greater number of points, but sam- 
pling the brightness variations more often produces higher- 
amplitude peaks in both the light curve and the Fourier 
transform. If one 'sigma clips' the data too closely from the 
beginning, the extrema of those peaks in the light curve 
might be lost. Clipping at 3σ is too tight: if the data were 
normally distributed, one discards roughly 1 in 370 points that 
naturally belong to the distribution in this manner; these points 
may lie further from the mean or fit, but are not necessarily 
erroneous outliers. A discussion of the validity of sigma 
clipping and other outlier removal procedures can be found 
in Hogg, Bovy & Lang (2010). 
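
The expected factor of √30 between the SC and LC scatter can be illustrated with simulated white noise. This is a sketch under the white-noise assumption made in the text (which the footnote notes is only an approximation for Kepler data); the sample size and random seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)   # fixed seed so the sketch is repeatable
N_LC = 5000       # number of simulated LC points (arbitrary)
BIN = 30          # SC points per LC point

# Simulated SC series: unit-variance Gaussian white noise.
sc_noise = [random.gauss(0.0, 1.0) for _ in range(N_LC * BIN)]

# Simulated LC series: average each block of 30 consecutive SC points.
lc_noise = [sum(sc_noise[i:i + BIN]) / BIN
            for i in range(0, len(sc_noise), BIN)]

scatter_ratio = statistics.pstdev(sc_noise) / statistics.pstdev(lc_noise)
print(f"SC/LC scatter ratio: {scatter_ratio:.2f} "
      f"(sqrt(30) = {math.sqrt(30):.2f})")
```

For correlated (non-white) noise the ratio would differ from √30, which is one simple way to test the white-noise approximation on a near-constant star.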



3 The noise is not white, but this serves as a useful 
approximation. 

5 PDC VS SAP-FLUX 

Data-files downloaded through either the Kepler Asteroseismic 
Science Operations Centre (KASOC⁴) or the NASA 
multimission archive⁵ contain times of observations, SAP-flux, 
PDC flux (or PDC MAP, depending on data release), 
and the errors on those fluxes. The SAP-flux data show 
instrumental trends, but the PDC flux light curves correct 
some of these. In both cases, bad cadences are flagged in the 
.fits data-file format. García et al. (2011) discuss the process 
of correcting Kepler light curves in more detail and specifically 
its application to asteroseismic analysis, even creating 
their own, separate, pipeline. A more thorough discussion 
of the PDC pipeline with more general applications 
can be found in the Kepler Data Characteristics Handbook⁶ 
(Christiansen & van Cleve 2011). Here, we focus on a few 
common issues, not including the information concerning 
processing of LC data that was presented in Section 3. 



5.1 Instrumental effects 

The most significant effects on Kepler light curves are dif- 
ferential velocity aberration (discussed in Section 3); loss of 
fine pointing, which leads to small gaps in the light curves in 
both SAP and PDC flux; reaction wheel zero-crossings, dur- 
ing which the spacecraft shakes for a day or so (not corrected 
in PDC data); cosmic ray events, which cause a step-function 
change in the flux level, which decays exponentially back to 
90-100 per cent of the original level (corrected in PDC data); 
monthly Earth downlinks, identifiable by a gap in the data 
of up to ~24 h followed by an exponential increase/decrease 
in flux level as the telescope returns to science operating fo- 
cus and temperature (corrected in PDC data); and attitude 
tweaks, which have thus far occurred only twice in the entire 
mission, during science operations in Q2, and are no longer 
expected to be a problem (not fully corrected in PDC data). 
Further details on these effects can be found in the Kepler 
Data Characteristics Handbook, which also keeps track of 
lists of known spurious peaks belonging to the SC and LC 
data. 



5.2 Optimal aperture corrections 

In SAP and PDC flux data alike, the amount of flux in 
the defined aperture changes from quarter to quarter, and 
discontinuities in flux are seen as a result. The PDC data 
contain corrections for both: a) the amount of light con- 
tributed from the intended target and not contaminating 
stars: the median flux over a month or quarter is multiplied 
by (1 − contamination) and subtracted from each cadence; 
and b) the fraction of flux not captured by the 'optimal aper- 
ture', which is defined to maximise signal-to-noise rather 
than to capture all light from the star. The SAP-flux data 
contain no such corrections. 



4 http://kasoc.phys.au.dk/ 

5 http://archive.stsci.edu/kepler/ 

6 http://archive.stsci.edu/kepler/manuals/Data_Characteristics_Handbook_20110201.pdf 



5.3 Effect on data analysis 

Whilst analysing the effect of cadence on the prevalence of 
low-frequency peaks in the periodogram, the PDC flux data 
were compared with the SAP-flux data. Fig. 5 illustrates the 
pervasiveness of low-frequency peaks in the SAP data com- 
pared with the much flatter PDC flux, for the near-constant 
star KIC 7450391, using the full Q2 LC dataset and not 
removing outliers. Not only are the SAP-flux data visibly 
poorer in this regard, but the amplitude of the PDC flux 
curve needed artificial amplification by a factor of ten, sim- 
ply to make it visible on these axes. This puts some per- 
spective on the problematic LC PDC flux data, in that the 
SAP-flux data are dominated much more substantially by 
artificial peaks than are the PDC flux data. 



6 CONCLUSION 

The SC data are almost always better than the LC data. 
The SC slot availability is the major limitation one faces 
when using and obtaining SC data. We have seen the neces- 
sity of the increased sampling rate of SC data for resolving 
short time-scale events such as flares, and for precise transit 
timing. 

The Nyquist frequency of LC data can be problematic 
and certainly limiting in asteroseismic analysis. Pulsations 
detected near the Nyquist frequency can be indicative of 
higher-frequency pulsations that require SC data, and also 
that peaks in the periodogram may be reflections of peaks 
from above the Nyquist frequency. However, the lack of pul- 
sations near the Nyquist frequency cannot rule out the pres- 
ence of higher-order p-modes, and SC data may still be re- 
quired. When SC data are not available to check for frequen- 
cies above the LC Nyquist frequency, one must be aware 
that observed signals could be reflections from beyond the 
LC Nyquist frequency. 

There is an amplitude difference between SC and LC 
data associated with the longer integration time of LC data. 
The result is that peaks in the periodogram have slightly 
higher amplitudes in SC data, and the percentage difference 
between the two cadences grows with increasing frequency. 

Effects that are not directly concerned with the different 
sampling rates have also been observed. LC PDC flux data 
often contain spurious peaks of non-astrophysical origin at 
very low frequency that are not always present with similar 
amplitudes in SC PDC data. This can affect studies of long- 
period brightness variations arising from such things as spots 
on slow rotators. Automatic frequency extraction can suffer 
from these artificial peaks. 

The greater number of points in SC data does not pro- 
duce narrower peaks in the periodogram and cannot improve 
resolution of two closely-spaced frequencies. Frequencies can 
be determined with greater precision though, as can their 
corresponding amplitudes and phases. For this to be true, 
the data cannot be normally distributed. 

The usefulness of PDC flux data was also discussed. 
Noise in the periodogram can be vastly reduced by analysing 
the PDC flux data instead of SAP-flux data. The PDC 
flux data have fewer drifts, jumps and outliers, generat- 
ing cleaner light curves and Fourier spectra, but may also 
modify astrophysical signals. It is therefore recommended 

Figure 5. The PDC-flux data (blue) have been plotted over the SAP-flux data (red) for the Kepler object KIC 7450391 during Q2. No 
points were removed from either dataset. The low-frequency peaks in PDC LC data mentioned in Section 3 are present but insignificant 
by comparison to the difference between PDC and SAP data. 



to cross-examine results obtained with PDC flux data with 
those from SAP-flux data, as advised in the Prefatory Ad- 
monition of the Data Characteristics Handbook, to check 
that genuine pulsation frequencies have not been missed as 
a result of accidental removal in the data processing pipeline. 
The PDC flux data do not necessarily remove all jumps 
and outliers, so it is still recommended to manually check 
light curves for such artefacts prior to analysis. Moreover, 
it is recommended that investigators analyse the subset of 
data points that have a quality flag of zero, meaning the 
cadences are 'good'. Of the observations made during Ke- 
pler's 92 per cent duty cycle, ~95 per cent of data points in 
'well-behaved' quarters might be described as 'good' for a 
typical 13th-magnitude star. Data generated in this method 
will supersede the present PDC flux data for asteroseismic 
analysis. 

Finally, the PDC pipeline leaves artificial peaks in the 
LC data. There are relatively high-amplitude peaks at low 
frequency (< 2 d⁻¹) in the Fourier transform of the LC PDC 
flux data. Such time-scales are important to asteroseismology, 
but unimportant for the transits of potentially habitable 
planets. Nevertheless, improvement is expected in 
data processed in Quarter 9 Release 12 and subsequent re- 
leases, through the PDC MAP pipeline. 



I would like to thank Don Kurtz for advice and discussions 
and acknowledge the financial support of the STFC 
via the PhD studentship programme. 



REFERENCES 

Amado P. J., Moya A., Suárez J. C., Martín-Ruiz S., Garrido R., 
Rodríguez E., Catala C., Goupil M. J., 2004, MNRAS, 352, L11 
Ballard S. et al., 2011, ApJ, 743, 200 
Borucki W. J. et al., 2010, Science, 327, 977 
Breger M., 2000, in Astronomical Society of the Pacific 
Conference Series, Vol. 210, Delta Scuti and Related Stars, 
M. Breger & M. Montgomery, ed., p. 3 
Christiansen J., van Cleve J. E., 2011, Kepler Data 
Characteristics Handbook 
Degroote P. et al., 2009, A&A, 506, 111 
García R. A. et al., 2011, MNRAS, 414, L6 
Gilliland R. L. et al., 2010a, PASP, 122, 131 
Gilliland R. L. et al., 2010b, ApJ, 713, L160 
Gilliland R. L. et al., 2011, ApJS, 197, 6 
Hogg D. W., Bovy J., Lang D., 2010, ArXiv e-prints 
Holman M. J., Murray N. W., 2005, Science, 307, 1288 
Jenkins J. M. et al., 2010, ApJ, 713, L120 
Koch D. G. et al., 2010, ApJ, 713, L79 
Lenz P., Breger M., 2004, in IAU Symposium, Vol. 224, The 
A-Star Puzzle, J. Zverko, J. Ziznovsky, S. J. Adelman, & 
W. W. Weiss, ed., pp. 786-790 
Stello D. et al., 2009, ApJ, 700, 1589 



© 0000 RAS, MNRAS 000, 000-000 



